Using Options for Long-Horizon Off-Policy Evaluation
Abstract
Evaluating a policy by deploying it in the real world can be risky and costly. Off-policy evaluation (OPE) algorithms use historical data collected by running a previous policy to evaluate a new policy, so a policy can be evaluated without ever being deployed. Importance sampling is a popular OPE method because it is robust to partial observability and works with continuous states and actions. However, we show that the amount of historical data required by importance sampling can scale exponentially with the horizon of the problem, that is, with the number of sequential decisions that are made. We propose using policies over temporally extended actions, called options, to address this long-horizon problem. We show theoretically and experimentally that combining importance sampling with options-based policies can significantly improve performance on long-horizon problems.
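The exponential dependence on the horizon, and why fewer decisions help, can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not from the paper: a hypothetical two-choice problem where every primitive step spent on the "good" action earns reward 1, a behaviour policy that picks "good" with probability 0.5, an evaluation policy that picks it with probability 0.9, and options that simply repeat one action for a fixed number of steps. Both policies are assumed to share the same options, so the ordinary per-episode importance weight needs one likelihood-ratio factor per option choice instead of one per primitive step. The names `ratio`, `behaviour_episode` and `is_samples` are illustrative only.

```python
import random
import statistics
from typing import List, Tuple

# Hypothetical toy problem (an assumption, not taken from the paper): at each
# decision the agent picks "good" (1) or "bad" (0), and every primitive step
# spent on "good" earns reward 1.
P_BEHAVIOUR = 0.5   # hypothetical behaviour policy: P(good)
P_EVALUATION = 0.9  # hypothetical evaluation policy: P(good)


def ratio(choice: int) -> float:
    """Likelihood ratio pi_e(choice) / pi_b(choice) for one decision."""
    p_e = P_EVALUATION if choice else 1.0 - P_EVALUATION
    p_b = P_BEHAVIOUR if choice else 1.0 - P_BEHAVIOUR
    return p_e / p_b


def behaviour_episode(rng: random.Random, horizon: int,
                      option_len: int) -> Tuple[float, float]:
    """Roll out the behaviour policy for `horizon` primitive steps.

    A decision is made every `option_len` steps. option_len=1 is an ordinary
    policy over primitive actions; option_len>1 is a policy over options that
    each repeat one action for option_len steps. Both policies are assumed to
    share these options, so only the option choice is reweighted.
    Returns (importance weight, return).
    """
    assert horizon % option_len == 0, "horizon must be a multiple of option_len"
    weight, ret = 1.0, 0.0
    for _ in range(horizon // option_len):
        choice = 1 if rng.random() < P_BEHAVIOUR else 0
        weight *= ratio(choice)      # one factor per decision, not per step
        ret += choice * option_len   # reward 1 per "good" primitive step
    return weight, ret


def is_samples(n_episodes: int, horizon: int, option_len: int,
               seed: int = 0) -> List[float]:
    """Per-episode importance-sampling terms w * G; their mean estimates v(pi_e)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_episodes):
        w, g = behaviour_episode(rng, horizon, option_len)
        samples.append(w * g)
    return samples


if __name__ == "__main__":
    horizon, n = 20, 20_000
    print(f"true value of pi_e: {P_EVALUATION * horizon:.1f}")
    for option_len in (1, 5):
        s = is_samples(n, horizon, option_len)
        print(f"decisions per episode={horizon // option_len:2d}  "
              f"IS estimate={statistics.mean(s):9.2f}  "
              f"sample std={statistics.stdev(s):10.2f}")
```

Both settings target the same value (0.9 × 20 = 18), and in both the estimator is unbiased. With 20 primitive decisions, each weight is a product of 20 factors of 1.8 or 0.2, so a single episode's weight can reach 1.8^20 (about 1.3 × 10^5) and the sample mean is dominated by a few rare episodes. With 5-step options there are only 4 factors, which caps the weight at 1.8^4 (about 10.5) and gives a much smaller sample spread. This is a deliberately simplified illustration of the abstract's claim, not a reproduction of the paper's experiments.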
Similar resources
An Empirical Analysis of Off-policy Learning in Discrete MDPs
Abstract Off-policy evaluation is the problem of evaluating a decision-making policy using data collected under a different behaviour policy. While several methods are available for addressing off-policy evaluation, little work has been done on identifying the best methods. In this paper, we conduct an in-depth comparative study of several off-policy evaluation methods in non-bandit, finite-hor...
The Neural Network Modeling Approach for Long Range Expansion Policy of Power Plant Centers
Traditionally, electrical power plant capacities are determined after specific plant locations have been selected. In this paper an expansion policy of power plant centers involving the choice of regions that must be allocated to power plant centers and power plant centers capacities over a specified planning horizon (years) is tackled. The problem is performed as a mixed integer programming mod...
Policy Capacity for Health Reform: Necessary but Insufficient; Comment on “Health Reform Requires Policy Capacity”
Forest and colleagues have persuasively made the case that policy capacity is a fundamental prerequisite to health reform. They offer a comprehensive life-cycle definition of policy capacity and stress that it involves much more than problem identification and option development. I would like to offer a Canadian perspective. If we define health reform as re-orienting the health system from acut...
Optimal Capital Structure with Sequential Options and Finite Horizon
A binomial lattice based framework for the analysis of finite investment options with finite operational phase is developed. Solutions for European and American type finite horizon investment options with optimal capital structure and a multi-stage investment setting with multiple debt issues are discussed. The analysis shows that optimal leverage ratios are not affected by option moneyness at ...
Review of Nutrition Policy Options for Increasing Fruit and Vegetable Consumption in the Populations: Lesson Learned and Policy Implications
Background: The development of policies for increasing fruit and vegetable consumption is highlighted as a priority in developing countries. This review study aimed to present the available policy options for increasing fruit and vegetable consumption in the populations. Methods: To collect relevant English publications, five electronic databases, including PubMed/Medline, Scopus, Embase, ProQu...
Journal title:
- CoRR
Volume: abs/1703.03453
Pages: -
Publication date: 2017